

# TDA2Px Performance

Piyali Goswami

#### **ABSTRACT**

This application report examines the System-on-Chip (SoC) level performance characteristics of key usecases targeted for TDA2Px. It discusses the data path infrastructure and the parameters that govern system-level throughput, and describes optimization techniques for achieving optimum system performance.

|    | Contents                         |
|----|----------------------------------|
| 1  | SoC Overview                     |
| 2  | Camera Interface Subsystem (CAL) |
| 3  | Imaging Subsystem (ISS)          |
| 4  | EMIF EDMA Performance            |
| 5  | IVI Usecase Performance          |
| 6  | ADAS Usecase Performance         |
| 7  | References                       |

|    | List of Figures                                                                           |
|----|-------------------------------------------------------------------------------------------|
| 1  | TDA2Px Block Diagram                                                                      |
| 2  | Camera Subsystem Overview                                                                 |
| 3  | CAL Initiator Bandwidth Statcoll Measurement                                              |
| 4  | CAL Initiator 4 Channel YUV422 BP Bandwidth Statcoll Measurement                          |
| 5  | VIP Initiator 1 Channel YUV420 Bandwidth Statcoll Measurement @ 239 MHz VP_CLK            |
| 6  | 4 Channel CAL + VIP Configuration for Surround View Applications                          |
| 7  | VIP Initiator 4 Channel YUV420 Bandwidth Statcoll Measurement @ 133 MHz VP_CLK            |
| 8  | CAL Bandwidth Along With Other Initiators With Adaptive MFLAG Setting                     |
| 9  | ISS Overview                                                                              |
| 10 | Single Pass ISP Sub Block Data Processing Flow                                            |
| 11 | ISS 1 Channel ISP Single Pass WDR Bandwidth                                               |
| 12 | Single Channel SIMCOP LDC Operation                                                       |
| 13 | Bilinear Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation |
| 14 | Bi-Cubic Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation |
| 15 | EDMA 2 TC ECC vs Non ECC Performance @ 532 MHz                                            |
| 16 | UC7 (Integrated Cockpit + Navi + Media + Radio)                                           |
| 17 | MPU (Standalone) OS Mimic Memory Copy Performance Bandwidth Plot                          |
| 18 | BB2D (Standalone) Graphics Mimic Performance Bandwidth Plot                               |
| 19 | DSS Standalone Bandwidth Profile for IVI Usecase Traffic                                  |
| 20 | VPE Standalone Bandwidth Profile for IVI Usecase                                          |
| 21 | IVA Standalone 1080p60 Decode Bandwidth Profile for IVI Usecase                           |
| 22 | TDA2Px Surround View System                                                               |
| 23 | DSS Standalone Bandwidth Profile for IVI Usecase Traffic                                  |

|    | List of Tables                                                        |
|----|-----------------------------------------------------------------------|
| 1  | CAL PPI0 (4L) Capture Test Parameters                                 |
| 2  | CAL PPI0 (4L) 4 Channel Capture Test Parameters                       |
| 3  | Initiator Average Bandwidth at Which CAL Overflows With DDR @ 532 MHz |
| 4  | CAL MFLAG Setting Behaviors                                           |
| 5  | ISS ISP Performance and Efficiency at Different Operating Frequencies |
| 6  | Calculating the LDC Bandwidth Degradation Factor                      |
| 7  | LDC Parameters for Unity Mesh Table Based Performance Analysis        |
| 8  | SIMCOP LDC Bi-Linear Interpolation Performance for Unity Mesh         |
| 9  | SIMCOP LDC Bi-Cubic Interpolation Performance for Unity Mesh          |
| 10 | SIMCOP LDC Performance With a Valid Mesh Table                        |
| 11 | ISS Multi-Initiator Bandwidth Analysis                                |
| 12 | Impact of System Traffic on ISP Single Pass WDR Performance           |
| 13 | Impact of System Traffic on LDC Performance                           |
| 14 | EMIF FIFO Sizing Differences Between TDA2xx and TDA2Px                |
| 15 | TDA2Px EMIF Performance Analysis @ 532 MHz and @ 666 MHz              |
| 16 | TDA2Px EMIF Performance vs TDA2xx @ 532 MHz                           |
| 17 | IVI Usecases and Different Initiator Roles                            |
| 18 | Top Three Worst Case Bandwidth Requirements for IVI                   |
| 19 | BW Knobs to Make IVI UC7 Work on TDA2Px                               |
| 20 | Initiator Wise Break Down of IVI UC7 Validation                       |
| 21 | ADAS 6 Channel Surround View + CMS With ISP                           |
| 22 | BW Knobs to Make ADAS 6Ch SRV + ISP Work on TDA2Px                    |
| 23 | Initiator Wise Break Down of ADAS 6Ch SRV + ISP Validation            |
| 24 | ADAS 4 Channel SRV + ISP Expected Bandwidth Analysis                  |
| 25 | Initiator Wise Break Down of ADAS 4Ch SRV + ISP Validation            |

# **Trademarks**

OMAP is a trademark of Texas Instruments.

All other trademarks are the property of their respective owners.



www.ti.com SoC Overview

# 1 SoC Overview

TDA2Px is a high-performance automotive vision application device based on the enhanced OMAP™ architecture and implemented in 28-nm process technology.

The device block diagram is shown in Figure 1.



Copyright © 2018, Texas Instruments Incorporated

Figure 1. TDA2Px Block Diagram

For more information regarding the TDA2Px device, see the *TDA2Px SoC for Advanced Driver Assistance Systems (ADAS) Silicon Revision 1.0 Technical Reference Manual.* 

The following sections discuss the performance of the new IP features added in the TDA2Px device relative to TDA2x, and the performance entitlement the device achieves for critical usecases.



# 2 Camera Interface Subsystem (CAL)

This section examines the performance of the Camera Interface subsystem, which comprises:

- Camera Adapter Layer (CAL)
- CAL interfaces:
  - Two PPI interfaces to CSI-2 PHY

Figure 2 shows the Camera subsystem block diagram.



Figure 2. Camera Subsystem Overview

The following enhancements are present in the CAL subsystem in TDA2Px:

- Video Port
  - VP1 is replicated as VP2, VP3, and VP4 (16 bits per cycle mode on the VPORTs).
  - The VP1 programming model is extended to the three new VPs without adding any new registers.
  - VPs are mapped automatically: VP1 -> CPORT1, VP2 -> CPORT2, VP3 -> CPORT3, VP4 -> CPORT4.
  - The same programming parameters are applied to all four VPs.
  - New input EN\_BASELINE\_MODE (1 bit): baseline compatibility mode, driven through a Control Module MMR.
  - EN\_BASELINE\_MODE = 1: CAL operates in baseline mode; EN\_BASELINE\_MODE = 0: four-VP mode.
- DMA
  - Support for a new mode that splits YUV422 (16-bit format) input into two DMA output planes, one for Y and one for UV (termed YUV422 bi-planar mode).
  - The DMA must allocate 2x channels for x input streams.
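As an illustration of the video port mapping described above, the following C sketch models which CPORT a given VP feeds in each mode. The function names are hypothetical and not part of any TI driver; the fixed VPn -> CPORTn mapping follows the list above, and the baseline-mode behavior (only the first VPORT active) follows Section 2.1.3.

```c
#include <stdbool.h>

/* Hypothetical helper, not a TI driver API: returns the CPORT number fed by
 * CAL video port `vp` (1..4), or 0 if that VP is not active.
 *   EN_BASELINE_MODE = 1: only VP1 is active (baseline compatibility mode).
 *   EN_BASELINE_MODE = 0: VP1..VP4 are active and VPn feeds CPORTn. */
static unsigned cal_vp_to_cport(unsigned vp, bool en_baseline_mode)
{
    if (vp < 1u || vp > 4u)
        return 0u;            /* TDA2Px provides VP1..VP4 only */
    if (en_baseline_mode && vp != 1u)
        return 0u;            /* VP2..VP4 are unused in baseline mode */
    return vp;                /* fixed mapping VPn -> CPORTn */
}

/* Number of active video ports for the selected mode. */
static unsigned cal_active_vports(bool en_baseline_mode)
{
    return en_baseline_mode ? 1u : 4u;
}
```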



### 2.1 CAL Standalone Performance

The following subsections discuss the performance of different modes of operation of the CAL IP in TDA2Px. All experiments used the following IP frequencies:

- CAL: 266 MHz
- CSI PHY Control Clock: 96 MHz
- L3 Interconnect: 266 MHz
- EMIF Controller: 266 MHz
- DDR3 Clock: 666 MHz

#### 2.1.1 CAL PPI0 (4L) Capture

In this mode, the CAL IP is configured to capture a single channel of 1920x1080 frames at 30 FPS. Pixels are configured to be 16 bits per pixel. The CAL Write DMA is configured for constant addressing mode and a linear write pattern.

The CAL test is configured with the parameters shown in Table 1.

Table 1. CAL PPI0 (4L) Capture Test Parameters

| Test Parameter             | Value        |
|----------------------------|--------------|
| Capture Width (in Pixels)  | 1920         |
| Capture Height (in Pixels) | 1080         |
| Bits per pixel             | 16           |
| Horizontal Blanking        | 10           |
| Vertical Blanking          | 15           |
| Data format                | 0x2A (Raw 8) |
| Number of Lanes            | 4 Lanes      |

Expected bandwidth = $1920 \times 1080 \times 30\ \text{FPS} \times 2\ \text{bytes per pixel} = 124.416\ \text{MBps}$

Measured average bandwidth = 128.439 MBps
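The expected-bandwidth arithmetic used throughout this section can be captured in a small C helper. This is a sketch of the report's formula only; the function names are illustrative, and MBps is taken as 10^6 bytes per second, which is the convention that reproduces the 124.416 MBps figure.

```c
/* Expected capture bandwidth in MBps (1 MBps = 10^6 bytes/s), following the
 * formula used in this report:
 *   width x height x frames per second x bytes per pixel x channels */
static double capture_bw_mbps(unsigned width, unsigned height, double fps,
                              double bytes_per_pixel, unsigned channels)
{
    return (double)width * height * fps * bytes_per_pixel * channels / 1e6;
}

/* Ratio of measured to expected bandwidth (1.0 = exact match). */
static double bw_ratio(double measured_mbps, double expected_mbps)
{
    return measured_mbps / expected_mbps;
}
```

For example, `capture_bw_mbps(1920, 1080, 30.0, 2.0, 1)` returns the 124.416 MBps expected value, and `bw_ratio(128.439, 124.416)` indicates that the measured average is roughly 3% higher than expected.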



Figure 3. CAL Initiator Bandwidth Statcoll Measurement



#### 2.1.2 4 Channel CAL PPI0 (4L) Capture With YUV422 to YUV422BP

In this mode, the CAL IP is configured to capture four channels of 1920x1080 frames at 30 FPS. Pixels are configured to be 16 bits per pixel. The CAL Write DMA is configured for constant addressing mode and a linear write pattern. Write DMA bi-planar mode is enabled to split the Y and UV planes.

Table 2. CAL PPI0 (4L) 4 Channel Capture Test Parameters

| Test Parameter             | Value        |
|----------------------------|--------------|
| Capture Width (in Pixels)  | 1920         |
| Capture Height (in Pixels) | 1080         |
| Bits per pixel             | 8            |
| Horizontal Blanking        | 10           |
| Vertical Blanking          | 15           |
| Data format                | 0x2A (Raw 8) |
| Number of Lanes            | 4 Lanes      |
| Number of Virtual Channels | 4            |

The pixel processing extraction is configured to 8 bits and the pixel packing is configured to 16 bits.

All pixel processing contexts and all write DMA contexts are used to generate eight streams (four channels, each with Y and UV data).

Expected bandwidth = $1920 \times 1080 \times 30\ \text{FPS} \times 2\ \text{bytes per pixel} \times 4\ \text{channels} = 497.664\ \text{MBps}$

Measured average bandwidth = 520.976 MBps

Without the YUV422 to YUV422 BP conversion, four WR\_DMA contexts are used and the bandwidth profile is similar.
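The bi-planar split doubles the number of write streams without changing the total bandwidth, because YUV422 carries 8 bits of luma and 8 bits of interleaved chroma per pixel. A minimal sketch, assuming each of the Y and UV planes therefore receives 1 byte per pixel (the struct and function names are hypothetical):

```c
/* Hypothetical helper: write bandwidth (MBps, 1 MBps = 10^6 bytes/s) for
 * YUV422 capture split into Y and UV planes (YUV422 bi-planar mode).
 * YUV422 is 16 bits per pixel: 8 bits of Y plus 8 bits of interleaved U/V,
 * so each plane receives 1 byte per pixel. */
struct yuv422bp_bw {
    double y_mbps;            /* per channel, Y plane */
    double uv_mbps;           /* per channel, UV plane */
    double total_mbps;        /* all channels, both planes */
    unsigned wr_dma_streams;  /* one Y and one UV stream per channel */
};

static struct yuv422bp_bw yuv422bp_bw_mbps(unsigned w, unsigned h,
                                           double fps, unsigned channels)
{
    struct yuv422bp_bw r;
    double pixels_per_s = (double)w * h * fps;

    r.y_mbps = pixels_per_s * 1.0 / 1e6;
    r.uv_mbps = pixels_per_s * 1.0 / 1e6;
    r.total_mbps = channels * (r.y_mbps + r.uv_mbps);
    r.wr_dma_streams = 2u * channels;
    return r;
}
```

With four 1920x1080 channels at 30 FPS this gives eight streams of about 62.2 MBps each, which sums to the 497.664 MBps expected above.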



Figure 4. CAL Initiator 4 Channel YUV422 BP Bandwidth Statcoll Measurement



#### 2.1.3 CAL to VIP 1 Channel Baseline Mode

When baseline mode is enabled, only one VPORT between CAL and VIP is enabled.

In this mode, VIP 1A is configured to capture the input frames using active video signaling and to convert the incoming YUV422 frames to YUV420. After capturing frames from PPI0, CAL forwards them to the first VPORT. For VIP to capture correctly, CAL\_CSI2\_CTXx.LINES must be set to the exact number of lines to be captured through VIP.

When erroneous frames are received because CSI packets were dropped due to transmission errors, CAL and VIP capture or drop the received frames as follows:

- Small Frame Reception Case: If the expected frame size is 65x65 (CAL\_CSI2\_CTXx.LINES = 65) and the received input frame is 64x64, VIP writes out no data.
- Correct Frame Reception Case: If the expected frame size is 64x64 (CAL\_CSI2\_CTXx.LINES = 64) and the received input frame is 64x64, all 64 lines are written out. This is the normal working case.
- Large Frame Reception Case: If the expected frame size is 48x48 (CAL\_CSI2\_CTXx.LINES = 48) and the received input frame is 64x64, only 48 of the 64 lines are written out; the extra data is not written out.
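The three observed cases can be summarized as a simple rule: a frame shorter than CAL\_CSI2\_CTXx.LINES produces no output, and any frame at least that long produces exactly LINES lines. The following C sketch encodes that rule; it is inferred only from the three cases above, and the function name is hypothetical.

```c
/* Model of the observed CAL-to-VIP frame-size behavior (name hypothetical).
 * lines_setting : value programmed in CAL_CSI2_CTXx.LINES
 * rx_lines      : number of lines actually received
 * Returns the number of lines VIP writes out. */
static unsigned vip_lines_written(unsigned lines_setting, unsigned rx_lines)
{
    if (rx_lines < lines_setting)
        return 0u;              /* short frame: nothing is written */
    return lines_setting;       /* exact or long frame: only LINES lines */
}
```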

CAL + VIP resumes receiving frames normally after short or long frames are received.

CAL + VIP can operate at a maximum VP\_CLK frequency of 90% of the VIP functional clock frequency. Because the VIP clock frequency in TDA2Px is 266 MHz, the maximum VPORT pixel clock frequency is 239 MHz ($0.9 \times 266\ \text{MHz} \approx 239\ \text{MHz}$).

The expected bandwidth in this test is $1920 \times 1080 \times 1.5\ \text{bytes per pixel} \times 30\ \text{FPS} = 93.312\ \text{MBps}$.

The observed bandwidth was VIP1\_P1 + VIP1\_P2 = 32.36 + 64.75 = 97.12 MBps.
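The split between the two VIP ports can be checked against the YUV420 plane sizes. The sketch below assumes a semi-planar layout (a 1 byte per pixel Y plane and a 0.5 byte per pixel interleaved UV plane); associating VIP1\_P2 with Y and VIP1\_P1 with UV is an inference from the measured values, not something stated in this report. Names are illustrative.

```c
/* Expected YUV420 semi-planar write bandwidth per plane, in MBps
 * (1 MBps = 10^6 bytes/s). Y plane: 1 byte per pixel. Interleaved UV plane:
 * 0.5 byte per pixel (chroma subsampled 2:1 horizontally and vertically). */
struct yuv420_bw {
    double y_mbps;
    double uv_mbps;
};

static struct yuv420_bw yuv420_bw_mbps(unsigned w, unsigned h, double fps)
{
    struct yuv420_bw r;
    double pixels_per_s = (double)w * h * fps;

    r.y_mbps = pixels_per_s * 1.0 / 1e6;
    r.uv_mbps = pixels_per_s * 0.5 / 1e6;
    return r;
}
```

For 1920x1080 at 30 FPS the sketch gives about 62.2 MBps for Y and 31.1 MBps for UV, close to the measured 64.75 MBps and 32.36 MBps.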



Figure 5. VIP Initiator 1 Channel YUV420 Bandwidth Statcoll Measurement @ 239 MHz VP\_CLK



#### 2.1.4 CAL to VIP 4 Channel Capture

In this configuration, EN\_BASELINE\_MODE is set to 0 (default), which enables all four VPORTs available in TDA2Px. CAL PPI0 (4L) is configured to capture four channels of 1920x1080 frames at 30 FPS with 2 bytes per pixel. The captured data is then sent to VIP 1A, 2A, 3A, and 4A, and VIP converts all four channels from YUV422 to YUV420.

This configuration is useful in surround view applications that are targeted for TDA2Px with an external ISP as shown in Figure 6.



Figure 6. 4 Channel CAL + VIP Configuration for Surround View Applications



The expected bandwidth is $1920 \times 1080 \times 1.5\ \text{bytes per pixel} \times 30\ \text{FPS} \times 4\ \text{channels} = 373.25\ \text{MBps}$.

The observed average bandwidth was VIP1\_P1 + VIP1\_P2 + VIP2\_P1 + VIP2\_P2 = 72.34 + 143.32 + 72.44 + 143.35 = 431.45 MBps.



Figure 7. VIP Initiator 4 Channel YUV420 Bandwidth Statcoll Measurement @ 133 MHz VP\_CLK

### 2.2 CAL Performance With Multiple Initiators

This section analyzes the CAL MFLAG behavior when multiple initiators run in parallel with CAL, a condition that can cause the CAL FIFO to overflow so that CAL fails to meet its real-time performance.

The CAL performance is further discussed in the context of planned automotive usecases for the TDA2Px device in Section 6.

Hard real-time traffic cannot be stalled for long periods of time. The camera sends data at a constant rate, so it can be stalled only until the FIFOs on the path fill up. When the FIFOs become full, data is discarded and the frame is corrupted. To minimize the risk of real-time data corruption, the CAL IP supports an MFLAG-based quality of service (QoS) mechanism.

Dynamic MFLAG generation is used when the write DMA operates on real time data. In that case, the MFLAG value depends on the number of slots ready to generate transactions in the write DMA (n):

- 00: SAFE (n < CAL\_CTRL.MFLAGL)
- 01: VULNERABLE (CAL\_CTRL.MFLAGL <= n < CAL\_CTRL.MFLAGH)
- 11: ENDANGERED (CAL\_CTRL.MFLAGH <= n)
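The MFLAG classification above maps directly to a threshold comparison. The following C sketch is illustrative only (the hardware computes this internally, and the enum and function names are hypothetical); it also shows why only 00 and 11 can occur when the two thresholds are equal, because the VULNERABLE range is then empty.

```c
/* MFLAG encodings as listed above (enum names are illustrative). */
enum cal_mflag {
    CAL_MFLAG_SAFE = 0x0,        /* 00 */
    CAL_MFLAG_VULNERABLE = 0x1,  /* 01 */
    CAL_MFLAG_ENDANGERED = 0x3   /* 11 */
};

/* n: number of write DMA slots ready to generate transactions. */
static enum cal_mflag cal_mflag_state(unsigned n, unsigned mflagl,
                                      unsigned mflagh)
{
    if (n < mflagl)
        return CAL_MFLAG_SAFE;
    if (n < mflagh)              /* never true when mflagl == mflagh */
        return CAL_MFLAG_VULNERABLE;
    return CAL_MFLAG_ENDANGERED;
}
```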

Software should ensure that:

- CAL\_CTRL.MFLAGL <= CAL\_CTRL.MFLAGH (when CAL\_CTRL.MFLAGL = CAL\_CTRL.MFLAGH, only 00 or 11 is generated)
- CAL\_CTRL.MFLAGL is 0x00, 0xFF, or less than or equal to $2^{(\text{WFIFO}-3)}$
- CAL\_CTRL.MFLAGH is 0x00, 0xFF, or less than or equal to $2^{(\text{WFIFO}-3)}$
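A driver could check these programming constraints before writing CAL\_CTRL. The helpers below are a hypothetical sketch, not a TI API; they encode the three rules exactly as listed.

```c
#include <stdbool.h>

/* Hypothetical check (not a TI API) of one threshold: it must be 0x00, 0xFF,
 * or at most 2^(WFIFO-3). */
static bool cal_mflag_threshold_ok(unsigned value, unsigned wfifo)
{
    if (wfifo < 3u)
        return false;                       /* avoid an invalid shift */
    return value == 0x00u || value == 0xFFu ||
           value <= (1u << (wfifo - 3u));
}

/* All three rules: MFLAGL <= MFLAGH and both thresholds in range. */
static bool cal_mflag_config_ok(unsigned mflagl, unsigned mflagh,
                                unsigned wfifo)
{
    return mflagl <= mflagh &&
           cal_mflag_threshold_ok(mflagl, wfifo) &&
           cal_mflag_threshold_ok(mflagh, wfifo);
}
```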

In this experiment, the following initiators were enabled along with CAL capture of four channels of 1920x1080 Raw 12-bit frames at 30 FPS, with the DDR frequency set to 532 MHz:

- DSS: one video pipe of 1920x826 ARGB8888 traffic at 60 FPS
- DSP1 EDMA and DSP2 EDMA



Table 3 shows the measured average bandwidth of each initiator under the conditions that cause CAL to overflow.

Table 3. Initiator Average Bandwidth at Which CAL Overflows With DDR @ 532 MHz

| Initiators | Average Bandwidth (MBps)    |
|------------|-----------------------------|
| CAL        | 747.31                      |
| DSP1_EDMA  | 2840.98                     |
| DSP2_EDMA  | 1850.67                     |
| DSS        | 377.81                      |
| IPU1       | 49.48                       |
| IPU2       | 17.31                       |
| EMIF1_SYS  | 2891.95                     |
| EMIF2_SYS  | 2891.59                     |
| Total EMIF | 5783.54 (67.9 % efficiency) |
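The 67.9% efficiency in Table 3 is consistent with comparing the total EMIF bandwidth against the theoretical peak of two DDR3 interfaces. The sketch below assumes two EMIFs, each with a 32-bit (4-byte) data bus transferring twice per DDR clock; the bus width is an assumption, chosen because it reproduces the tabulated value. Function names are illustrative.

```c
/* Theoretical peak DDR bandwidth in MBps (1 MBps = 10^6 bytes/s).
 * Assumption: each EMIF has a bus_bytes-wide data bus and DDR3 transfers
 * data on both clock edges (2 transfers per clock). */
static double ddr_peak_mbps(double ddr_clk_mhz, unsigned bus_bytes,
                            unsigned num_emifs)
{
    return ddr_clk_mhz * 2.0 * bus_bytes * num_emifs;
}

/* Fraction of the theoretical peak that was actually achieved. */
static double emif_efficiency(double measured_mbps, double peak_mbps)
{
    return measured_mbps / peak_mbps;
}
```

At 532 MHz this gives a peak of 8512 MBps, and 5783.54 / 8512 is approximately 0.679.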

With this initiator bandwidth profile, the CAL MFLAG behavior is as shown in Table 4.

Table 4. CAL MFLAG Setting Behaviors

| CAL MFLAG Setting                       | CAL Overflow Behavior |
|-----------------------------------------|-----------------------|
| No CAL MFLAG                            | CAL Overflows         |
| Always-on CAL MFLAG                     | No CAL Overflows      |
| Adaptive MFLAG (50% - 75% of 8KB WFIFO) | CAL Overflows         |
| Adaptive MFLAG (25% - 75% of 8KB WFIFO) | No CAL Overflows      |

The following adaptive MFLAG setting is recommended to avoid CAL overflow:

```c
/* Set adaptive MFLAG thresholds at 25% (MFLAGL) and 75% (MFLAGH) of the
 * write FIFO size (64 entries x 16 bytes). WFIFO = 9 for TDA2Px.
 */
WR_FIELD_32(CAL_INST, CAL__CAL_CTRL, CAL__CAL_CTRL__MFLAGH, 0x30); /* 48 of 64 entries (75%) */
WR_FIELD_32(CAL_INST, CAL__CAL_CTRL, CAL__CAL_CTRL__MFLAGL, 0x10); /* 16 of 64 entries (25%) */
```
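The register values above follow from the FIFO depth. As a sketch, assuming the write FIFO depth in entries equals the $2^{(\text{WFIFO}-3)}$ bound from the constraints (64 entries for WFIFO = 9, matching the code comment), a hypothetical helper can derive the thresholds from percentages:

```c
/* Hypothetical helper: MFLAG threshold for a percentage of the CAL write
 * FIFO. Assumes the FIFO depth in entries is 2^(WFIFO-3), i.e. 64 entries
 * (of 16 bytes) for WFIFO = 9. */
static unsigned cal_mflag_threshold(unsigned wfifo, unsigned percent)
{
    unsigned depth = 1u << (wfifo - 3u);   /* FIFO entries */
    return depth * percent / 100u;         /* integer division rounds down */
}
```

`cal_mflag_threshold(9, 25)` and `cal_mflag_threshold(9, 75)` give 0x10 and 0x30, the MFLAGL and MFLAGH values used in the recommended setting.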

A snapshot of the bandwidth profile with this CAL MFLAG setting is as shown in Figure 8.



Figure 8. CAL Bandwidth Along With Other Initiators With Adaptive MFLAG Setting



# 3 Imaging Subsystem (ISS)

The imaging subsystem (ISS) (see Figure 9) processes pixel data coming from memory (image format encoding and decoding can be done to and from memory). With its subparts, such as interfaces and interconnects, the image signal processor (ISP), and the still image coprocessor (SIMCOP), the ISS is a key component for the following applications:

- Rear View Camera
- Front View Stereo Camera
- Surround View Camera




Figure 9. ISS Overview

### 3.1 ISS Standalone Performance

This section reviews the ISS standalone performance for the modes of operation planned for ISS-based usecases in TDA2Px. All experiments used the following IP frequencies:

- ISS: 354 MHz (OPP\_NOM), 425 MHz (OPP\_OD), 532 MHz (OPP\_HIGH)
- L3 Interconnect: 266 MHz
- EMIF Controller: 266 MHz
- DDR3 Clock: 666 MHz



#### 3.1.1 ISP Memory to Memory Single Pass WDR

The ISP processing flow in memory-to-memory mode for single pass wide dynamic range (WDR) is shown in Figure 10.



Figure 10. Single Pass ISP Sub Block Data Processing Flow

The expected bandwidth in this configuration is calculated as shown below:

- Input = 1920 x 1080 x 30 FPS x 12 bits per pixel / 8 = 93.312 MBps
- Output = 1920 x 1080 x 30 FPS x 1.5 Bytes per pixel = 93.312 MBps

Measured at 39.5 FPS, the average bandwidth was 255.88 MBps. The bandwidth profile is shown in Figure 11.
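The measured average appears to cover both the read and the write traffic of the memory-to-memory pass, so it is best compared with the sum of the input and output figures scaled to the measured frame rate. A minimal sketch of that calculation, with illustrative names:

```c
/* Expected memory-to-memory ISP bandwidth in MBps (1 MBps = 10^6 bytes/s):
 * RAW input read plus YUV420 output write. Names are illustrative. */
static double isp_m2m_bw_mbps(unsigned w, unsigned h, double fps,
                              double in_bits_per_pixel,
                              double out_bytes_per_pixel)
{
    double pixels_per_s = (double)w * h * fps;
    double in_mbps = pixels_per_s * in_bits_per_pixel / 8.0 / 1e6;  /* read */
    double out_mbps = pixels_per_s * out_bytes_per_pixel / 1e6;     /* write */

    return in_mbps + out_mbps;
}
```

At 30 FPS the sum is 186.624 MBps; scaled to 39.5 FPS it is about 245.7 MBps, compared with the measured 255.88 MBps.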



Figure 11. ISS 1 Channel ISP Single Pass WDR Bandwidth

To further understand how the ISS frequency affects the maximum performance achieved by the ISP, experiments were conducted at the OPP\_NOM, OPP\_OD, and OPP\_HIGH frequencies. The results were compared with the TDA3x ISP performance at 212.8 MHz. The ISP efficiency for single pass WDR was approximately 94% at all frequencies.



Table 5. ISS ISP Performance and Efficiency at Different Operating Frequencies

| ISS frequency<br>(MHz) | Frequency Increase<br>vs 212.8 MHz | Single Pass<br>WDR Time (ms) | Single Pass<br>WDR Mpix/s | ISP Efficiency | Max Number of<br>1080p30<br>channels | Headroom (ms)<br>in 33 ms |
|------------------------|------------|------------------------------|---------------------------|----------------|--------------------------------------|---------------------------|
| 212.8                  | 0%         | 10.4                         | 199.38                    | 94%            | 3                                    | 1.8                       |
| 354                    | 66%        | 6.2                          | 334.45                    | 94%            | 5                                    | 2                         |
| 425.6                  | 100%       | 5.2                          | 398.77                    | 94%            | 6                                    | 1.8                       |
| 532                    | 150%       | 4.15                         | 499.66                    | 94%            | 7                                    | 3.95                      |
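The derived columns in Table 5 (and in the later ISS tables) can be reproduced from the per-frame processing time. In the sketch below, throughput is the 1920x1080 frame size divided by the processing time, the maximum channel count is how many frames fit in the 33 ms budget, and headroom is the time left over. Efficiency is computed as Mpix/s divided by the ISS clock in MHz (pixels per clock); this interpretation is inferred because it matches the tabulated percentages, and is not stated in the report. The names are illustrative.

```c
/* Per-frame budget figures as tabulated in this report (names illustrative).
 * frame_ms  : processing time for one frame of w x h pixels
 * iss_mhz   : ISS clock frequency
 * budget_ms : frame period used in the tables (33 ms for 30 FPS) */
struct isp_budget {
    double mpix_per_s;      /* throughput in megapixels per second */
    double efficiency;      /* pixels per ISS clock cycle */
    unsigned max_channels;  /* whole frames that fit in the budget */
    double headroom_ms;     /* time left after max_channels frames */
};

static struct isp_budget isp_budget_from_time(double frame_ms, double iss_mhz,
                                              unsigned w, unsigned h,
                                              double budget_ms)
{
    struct isp_budget b;

    /* pixels per microsecond is numerically equal to Mpix/s */
    b.mpix_per_s = (double)w * h / (frame_ms * 1e3);
    b.efficiency = b.mpix_per_s / iss_mhz;
    b.max_channels = (unsigned)(budget_ms / frame_ms); /* truncation = floor here */
    b.headroom_ms = budget_ms - b.max_channels * frame_ms;
    return b;
}
```

For example, a 4.15 ms frame time at 532 MHz yields about 499.66 Mpix/s, 94% efficiency, 7 channels, and 3.95 ms of headroom, matching the last row.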

#### 3.1.2 SIMCOP Memory-to-Memory Lens Distortion Correction

Lens distortion correction (LDC) reads the distorted input frame according to a mesh table that maps input blocks to output blocks. The LDC read interface reads the input image and the mesh table, and the SIMCOP DMA writes the output blocks to DDR. The data flow is shown in Figure 12.



Figure 12. Single Channel SIMCOP LDC Operation

The output block size determines the degradation factor applied to the LDC input bandwidth. Table 6 shows an example of how this degradation factor is calculated.

Table 6. Calculating the LDC Bandwidth Degradation Factor



The following SIMCOP settings provide the best performance and were applied in all SIMCOP-based measurements. Application and ISS driver developers are recommended to set:

- SIMCOP DMA Burst Size = 8 x 128 Bit Burst
- SIMCOP DMA Tags = 0xF
- LDC Read Tags = 0xF
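For reference in driver code, the recommended settings can be collected in one place. The struct and field names below are hypothetical (they are not the SIMCOP register or TI driver names), and the sketch only records the values listed above plus the burst size in bytes implied by 8 beats of 128 bits.

```c
#include <stdint.h>

/* Hypothetical record of the recommended SIMCOP settings; the struct and
 * field names are illustrative, not SIMCOP register or TI driver names. */
struct simcop_perf_cfg {
    uint8_t dma_burst_beats;  /* SIMCOP DMA burst length in 128-bit beats */
    uint8_t dma_tags;         /* SIMCOP DMA tags setting */
    uint8_t ldc_read_tags;    /* LDC read tags setting */
};

static const struct simcop_perf_cfg simcop_recommended = {
    .dma_burst_beats = 8u,
    .dma_tags = 0xFu,
    .ldc_read_tags = 0xFu,
};

/* Bytes moved per SIMCOP DMA burst: beats x 16 bytes (128 bits). */
static unsigned simcop_burst_bytes(const struct simcop_perf_cfg *cfg)
{
    return cfg->dma_burst_beats * 16u;
}
```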



With these settings, the SIMCOP LDC performance was analyzed with a unity mesh table. The parameters shown in Table 7 were used for the analysis:

Table 7. LDC Parameters for Unity Mesh Table Based Performance Analysis

| LDC Parameter            | LDC Parameter Value  |
|--------------------------|----------------------|
| Input Width              | 1920                 |
| Input Height             | 1088                 |
| Output Width             | 1920                 |
| Output Height            | 1088                 |
| Output Block Width       | 64                   |
| Output Block Height      | 32                   |
| Pixel Pad                | 3                    |
| Interpolation            | Bi-Cubic / Bi-Linear |
| LDC LUT Downscale factor | 4                    |

The SIMCOP performance was analyzed at the OPPs planned for TDA2Px for both Bi-linear and Bi-cubic interpolation. As expected, Bi-cubic interpolation achieves approximately half the throughput of Bi-linear interpolation (44% to 49% efficiency versus 97%). Table 8 and Table 9 show the detailed results for Bi-linear and Bi-cubic interpolation, respectively; these represent the best-case SIMCOP LDC efficiency.

Table 8. SIMCOP LDC Bi-Linear Interpolation Performance for Unity Mesh

| ISS Frequency<br>(MHz) | Frequency Increase<br>vs 212.8 MHz | LDC Time (ms)<br>(Bi-linear) | Standalone LDC<br>(Unity Mesh)<br>Mpix/s (BiLinear) | Efficiency | Max Number of<br>1080p30<br>Channels | Headroom (ms)<br>in 33 ms |
|------------------------|-----------------|------------------------------|-----------------------------------------------------|------------|--------------------------------------|---------------------------|
| 212.8                  | 0%              | 10.00                        | 207.36                                              | 97%        | 3                                    | 3                         |
| 354                    | 66%             | 6.02                         | 344.22                                              | 97%        | 5                                    | 2.88                      |
| 425.6                  | 100%            | 5.03                         | 412.66                                              | 97%        | 6                                    | 2.85                      |
| 532                    | 150%            | 4.02                         | 516.33                                              | 97%        | 8                                    | 0.872                     |

Table 9. SIMCOP LDC Bi-Cubic Interpolation Performance for Unity Mesh

| ISS frequency<br>(MHz) | Frequency Increase<br>vs 212.8 MHz | LDC Time (ms)<br>(Bicubic) | LDC Mpix/s<br>(Bicubic) | Efficiency | Max Number of<br>1080p30<br>Channels | Headroom (ms)<br>in 33 ms |
|------------------------|------------|----------------------------|-------------------------|------------|--------------------------------------|---------------------------|
| 212.8                  | 0%         | 19.82                      | 104.62                  | 49%        | 1                                    | 13.18                     |
| 354                    | 66%        | 11.9                       | 174.25                  | 49%        | 2                                    | 9.2                       |
| 425.6                  | 100%       | 10                         | 207.36                  | 49%        | 3                                    | 3                         |
| 532                    | 150%       | 8.85                       | 234.31                  | 44%        | 3                                    | 6.45                      |

Analysis was also performed with a valid mesh table, with the output block width, block height, and pixel pad set to 32, 36, and 2, respectively. As Table 10 shows, the impact is visible with Bi-linear interpolation, where the efficiency drops from 97% to 93%.

Table 10. SIMCOP LDC Performance With a Valid Mesh Table

| Condition                      | ISS Frequency<br>(MHz) | LDC Time (ms) | LDC Mpix/s | Efficiency | Max Number<br>of 1080p30<br>Channels | Headroom<br>(ms) in 33 ms | Average<br>Bandwidth<br>(MBps) |
|--------------------------------|------------------------|---------------|------------|------------|--------------------------------------|---------------------------|--------------------------------|
| LDC Bi-linear<br>Interpolation | 532                    | 4.20          | 493.71     | 93%        | 7                                    | 3.6                       | 1917.45                        |
| LDC Bi-cubic Interpolation     | 532                    | 7.94          | 261.16     | 49%        | 4                                    | 1.24                      | 1049.98                        |



The bandwidth profiles with a valid mesh table are shown in Figure 13 (bi-linear interpolation) and Figure 14 (bi-cubic interpolation).



Figure 13. Bilinear Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation



Figure 14. Bi-Cubic Interpolation Bandwidth Profile With a Valid Mesh Table at 532 MHz ISS Operation



#### 3.1.3 ISS Performance With Multiple Initiators

This section discusses how traffic from other initiators running simultaneously affects ISS performance. Table 11 lists the initiators enabled in this experiment and their measured bandwidth.

Table 11. ISS Multi-Initiator Bandwidth Analysis

| Initiator | Initiator Bandwidth (MBps) | Remarks                         |
|-----------|----------------------------|---------------------------------|
| ISP       | 1488.47                    | RAW to YUV420                   |
| BB2D      | 1959.07                    | SGX 3D Synthesis (Output RGB24) |
| DSS       | 360.63                     |                                 |
| EDMAs     | 449.25                     | Loading EDMA traffic            |
| MPU       | 1088.25                    | A15 mem copy loading traffic.   |
| Total     | 5345.67                    |                                 |

The impact of the other initiators on ISP Single Pass WDR performance was measured as the increase in frame processing time and the resulting decrease in efficiency and Mpix/s throughput. The results are shown in Table 12.

Table 12. Impact of System Traffic on ISP Single Pass WDR Performance

| Condition              | ISS Frequency<br>(MHz) | Single Pass<br>WDR Time (ms) | Single Pass<br>WDR Mpix/s | Efficiency | Max Number of<br>1080p30<br>Channels | Headroom (ms)<br>in 33 ms |
|------------------------|------------------------|------------------------------|---------------------------|------------|--------------------------------------|---------------------------|
| Standalone ISP         | 532                    | 4.15                         | 499.66                    | 94%        | 7                                    | 3.95                      |
| With System<br>Traffic | 532                    | 4.45                         | 465.98                    | 88%        | 7                                    | 1.85                      |

Lens distortion correction with Bi-linear interpolation was then added to this multi-initiator traffic to analyze the impact of system traffic on LDC performance. Because the LDC reads and SIMCOP DMA writes access memory in 2D blocks, system traffic has a significant impact on LDC performance. Typical measures to mitigate this impact are:

- Place a bandwidth regulator on the NRT2 port to give the LDC traffic the required priority.
- Place the mesh table data in the device OCMC RAM to reduce LDC traffic contention at the DDR.

NOTE: In the TDA2Px integration, the NRT1 and NRT2 ports have the same number of L3 switch hops. Thus, unlike on TDA3xx, swapping NRT1 and NRT2 gives no noticeable bandwidth benefit.




Table 13. Impact of System Traffic on LDC Performance

| Condition                                              | LDC Time (ms) (Bi-<br>linear) | LDC Mpix/s (Bi-<br>Linear) | Efficiency | Max Number of 1080p30 Channels | Headroom (ms) in 33 ms |
|--------------------------------------------------------|-------------------------------|----------------------------|------------|--------------------------------|------------------------|
| LDC Only (1)                                           | 4.20                          | 493.71                     | 93%        | 7                              | 3.6                    |
| LDC + Single Pass<br>WDR (2)                           | 4.36                          | 475.60                     | 89%        | 7                              | 2.48                   |
| With Other System Traffic (3)                          | 5.80                          | 357.52                     | 67%        | 5                              | 4                      |
| With Other System Traffic (BR on NRT2 = 1600 MBps) (4) | 5.40                          | 384.00                     | 72%        | 6                              | 0.6                    |

- (1) Standalone, no other initiators
- (2) SIMCOP + ISP Traffic
- (3) LDC added to the SRV traffic. Overall DDR traffic = 5.217 GBps
- (4) Bandwidth regulator enabled on the NRT2 port. Overall DDR traffic = 5.15 GBps. Swapping NRT1 and NRT2 makes no difference, as expected, because both ports have the same number of L3 switch levels.

# 4 EMIF EDMA Performance

The TDA2Px EMIF controller and DDR PHY have the following enhancements with respect to the TDA2xx EMIF controller and PHY:

- Support for 666 MHz DDR3 clock
- Optimized Command and Write Data FIFO sizing (see Table 14)
- ECC read-modify-write (RMW) support

Table 14. EMIF FIFO Sizing Differences Between TDA2xx and TDA2Px

| Parameter                 | TDA2xx System Local<br>Interface Entries | TDA2xx MPU Local<br>Interface Entries | TDA2Px System Local<br>Interface Entries | TDA2Px MPU Local<br>Interface Entries |
|---------------------------|------------------------------------------|---------------------------------------|------------------------------------------|---------------------------------------|
| Pre Command FIFO          | 6                                        | 4                                     | 6                                        | 4                                     |
| Command FIFO              | Up to 10                                 | Up to 10                              | Up to 16                                 | Up to 16                              |
| Pre Write FIFO            | 6                                        | 8                                     | 10                                       | 12                                    |
| RMW FIFO                  | NA                                       | NA                                    | Up to 16                                 | Up to 16                              |
| Write Data FIFO (256-bit) | Up to (19 × 256 bits) + 6                | Up to 19 + 8                          | Up to (16 × 256 bits)                    | Up to 16                              |
| Return Command FIFO       | 22                                       | 24                                    | 22                                       | 24                                    |
| SDRAM Read Data FIFO      | 22                                       | 24                                    | 22                                       | 24                                    |
| Register Read Data FIFO   | 2                                        | 0                                     | 2                                        | 0                                     |
| RMW Read Data FIFO        | NA                                       | NA                                    | Up to 16                                 | Up to 16                              |




To understand the impact of the differences between the TDA2xx and TDA2Px EMIF, an EDMA transfer using two transfer controllers (2 TC) was performed.

The effect of ECC versus non-ECC transfers was also analyzed. The results, shown in Figure 15, indicate that ECC and non-ECC transfer performance is comparable.



Figure 15. EDMA 2 TC ECC vs Non ECC Performance @ 532 MHz

A second experiment examined the combined impact of the frequency increase from 532 MHz to 666 MHz and the FIFO sizing changes. A partial IVI usecase was run, with additional EDMA load running in parallel to keep the EMIF FIFOs fully occupied. The IVI traffic is described in more detail in Section 5. As Table 15 shows, scaling the DDR frequency from 532 MHz to 666 MHz (approximately 25%) raises the total bandwidth from 5486.26 MBps to 6863.22 MBps, an equivalent gain of approximately 25%.

Table 15. TDA2Px EMIF Performance Analysis @ 532 MHz and @ 666 MHz

| Expt Name        | TDA2Px-DDR3 @ 532 MHz | TDA2Px-DDR3 @ 666 MHz |
|------------------|-----------------------|-----------------------|
| EMIF1_SYS (MBps) | 2045.97               | 2799.14               |
| EMIF2_SYS (MBps) | 2046.54               | 2790.17               |
| MA_MPU_P1 (MBps) | 696.62                | 634.85                |
| MA_MPU_P2 (MBps) | 697.13                | 639.06                |
| Total DDR BW     | 5486.26               | 6863.22               |
| Efficiency       | 64.5%                 | 64.4%                 |




To push the TDA2Px device to its limit, the EDMA transfers were increased until the DSS started to underflow. The real-time DSS traffic requirement of the IVI usecase is high, so in this experiment the DSS underflows because of L3 limits. This analysis shows the maximum EMIF performance available when EMIF FIFO usage is maximized. Table 16 compares the TDA2Px and TDA2xx EMIF performance in this configuration, both operating at 532 MHz.

Table 16. TDA2Px EMIF Performance vs TDA2xx @ 532 MHz

| Expt Name        | TDA2Px-EMIF @ 532 MHz | TDA2xx EMIF @ 532 MHz |
|------------------|-----------------------|-----------------------|
| EMIF1_SYS (MBps) | 2496.503864           | 2002.71136            |
| EMIF2_SYS (MBps) | 2493.411011           | 1985.6016             |
| MA_MPU_P1 (MBps) | 396.9076276           | 720.65952             |
| MA_MPU_P2 (MBps) | 395.2099299           | 719.42016             |
| Total DDR BW     | 5782.032432           | 5428.39264            |
| Efficiency       | 67.9%                 | 63.8%                 |
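The Efficiency rows in Tables 15 and 16 match a simple calculation: the total measured DDR bandwidth divided by a theoretical peak for two 32-bit DDR3 interfaces that transfer on both clock edges. That peak is 8512 MBps at 532 MHz and 10656 MBps at 666 MHz. The Python sketch below reproduces those percentages and the 25% scaling claim. The two-EMIF, 32-bit configuration is an assumption inferred because it matches the published numbers. This section does not state it.

```python
# Sketch: reproduce the "Efficiency" rows of Tables 15 and 16.
# Assumption (inferred from the published numbers, not stated here): two EMIFs,
# each a 32-bit (4-byte) DDR3 interface transferring on both clock edges.

NUM_EMIFS = 2        # assumed
BYTES_PER_BEAT = 4   # assumed 32-bit data bus per EMIF


def peak_ddr_mbps(ddr_clock_mhz):
    """Theoretical peak DDR bandwidth in MBps (1 MB = 10**6 bytes)."""
    return NUM_EMIFS * BYTES_PER_BEAT * 2 * ddr_clock_mhz


def efficiency(port_mbps, ddr_clock_mhz):
    """Sum of all measured EMIF port bandwidths as a fraction of the peak."""
    return sum(port_mbps.values()) / peak_ddr_mbps(ddr_clock_mhz)


# Statcoll averages in MBps, copied from Tables 15 and 16.
tda2px_532 = {"EMIF1_SYS": 2045.97, "EMIF2_SYS": 2046.54,
              "MA_MPU_P1": 696.62, "MA_MPU_P2": 697.13}
tda2px_666 = {"EMIF1_SYS": 2799.14, "EMIF2_SYS": 2790.17,
              "MA_MPU_P1": 634.85, "MA_MPU_P2": 639.06}
tda2px_532_max = {"EMIF1_SYS": 2496.503864, "EMIF2_SYS": 2493.411011,
                  "MA_MPU_P1": 396.9076276, "MA_MPU_P2": 395.2099299}
tda2xx_532_max = {"EMIF1_SYS": 2002.71136, "EMIF2_SYS": 1985.6016,
                  "MA_MPU_P1": 720.65952, "MA_MPU_P2": 719.42016}

for label, ports, clock in [
    ("TDA2Px @ 532 MHz (Table 15)", tda2px_532, 532),
    ("TDA2Px @ 666 MHz (Table 15)", tda2px_666, 666),
    ("TDA2Px @ 532 MHz (Table 16)", tda2px_532_max, 532),
    ("TDA2xx @ 532 MHz (Table 16)", tda2xx_532_max, 532),
]:
    total = sum(ports.values())
    print(f"{label}: {total:8.2f} MBps, efficiency {efficiency(ports, clock):.1%}")

# Frequency scaling versus bandwidth scaling (Table 15).
clock_gain = 666 / 532 - 1
bw_gain = sum(tda2px_666.values()) / sum(tda2px_532.values()) - 1
print(f"Clock +{clock_gain:.1%}, total bandwidth +{bw_gain:.1%}")
```

Under this assumption, the TDA2Px EMIF at 532 MHz runs at about 4 percentage points higher efficiency than the TDA2xx EMIF under the same load (67.9% versus 63.8%). This is consistent with the larger FIFOs listed in Table 14.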

## 5 IVI Usecase Performance

In an automotive infotainment system, TDA2Px can be used either as the main processor in the head unit (Integrated Head Unit) or as a co-processor (an applications (HMI) processor, or a co-processor for radio and audio). An infotainment system requires a rich set of features: a high-level OS, high-resolution multi-display, camera input, navigation, speech, radio, multimedia, and connectivity support.

TDA2Px extends the target use cases for TDA2x. The list of planned usecases for IVI is given in Table 17.

Table 17. IVI Usecases and Different Initiator Roles

| Usecase | 2xA15 | GPU | IPU2 + IVA-HD +<br>VPE Decode | IPU2 + IVA-HD<br>Encode | C66x DSPs /<br>EVEs | IPU1 |
|-------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|-------------------------------------|-------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|-----------|
| UC1 (Highway<br>Driving + Dual-<br>Navi + Media):<br>Infotainment +<br>Cluster Info                                                             | HLOS, HMI, Nav<br>(main disp.) + Nav<br>(info disp.),<br>Connectivity, and<br>so forth.                 | HMI, Nav (maps<br>on both displays) | 1080i-60 decode<br>DTV w/external<br>decoder De-<br>Interlace 1x<br>1080i60 (for RSE<br>display)                  | 1080p30 encode<br>(for remote eAVB<br>display)                                                                                                                  | Audio mixing & routing                                                                                                               | CAN stack |
| UC2 (Highway<br>Driving +<br>Projection + Navi<br>+ Media):<br>Infotainment +<br>Cluster Info +<br>Multi- DAB Radio                             | HLOS, Projection<br>mode, HMI, Nav<br>(info disp.),<br>Connectivity, etc.                               | HMI, Nav                            | Phone projection<br>mode: 1080p30<br>decode                                                                       | 1080p30 encode<br>(for remote CE<br>device over WiFi)                                                                                                           | Multi DAB Radio<br>+ Audio mixing &<br>routing                                                                                       | CAN stack |
| UC3 (Street Driving + Navi + 2D Surround- view, or LDW/TSR/OD): Infotainment + Cluster Info + Multi- DAB Radio + Info ADAS or Driver Monitoring | HLOS, HMI, Nav,<br>Connectivity, and<br>so forth.                                                       | HMI, Nav                            | 1080i-60 decode<br>DTV w/external<br>decoder De-<br>Interlace 1x<br>1080i60 (for CE<br>device, or RSE<br>display) | 1080p30 encode<br>(for remote CE<br>device over WiFi),<br>OR Car black-box<br>encoding<br>(recording at least<br>2 cameras –<br>front/back: 720p<br>resolution) | InfoADAS (2D<br>SRV + OD/PD)<br>(or LDW + TSR +<br>OD/PD, or Driver<br>monitor.) + Multi<br>DAB Radio +<br>Audio mixing &<br>routing | CAN stack |
| UC4 (3D SRV w/<br>Park Assist):<br>Infotainment +<br>Cluster Info +<br>Dual DAB Radio | HLOS, HMI, 3D<br>SRV (main disp.)<br>+ Cluster Info<br>(info disp.),<br>Connectivity, and<br>so forth. | HMI, 3D SRV<br>processing | | | 3D SRV + Park<br>Assist + Multi-<br>DAB Radio +<br>Audio mixing &<br>routing | |
| UC5 (Street<br>Driving w/ 3D<br>SRV + Navi +<br>Media):<br>Infotainment +<br>Cluster Info +<br>Multi- DAB Radio | HLOS, HMI, 3D<br>SRV (main disp.)<br>+ Nav (info disp.),<br>Connectivity, and<br>so forth. | HMI, 3D SRV<br>processing, Nav | 1080i-60 decode<br>DTV w/external<br>decoder De-<br>Interlace 1x<br>1080i60 (for RSE<br>display) | 1080p30 encode<br>(for remote eAVB<br>display) OR Car<br>black-box<br>encoding<br>(recording at least<br>2 cameras –<br>front/back: 720p<br>resolution) | 3D SRV + Multi-<br>DAB Radio +<br>Audio mixing &<br>routing | CAN stack |
| UC6 (Integrated<br>Cockpit + Navi +<br>Media):<br>Infotainment +<br>Digital Cluster                             | Hypervisor, IVI<br>HLOS, Cluster<br>OS, Nav,<br>Connectivity                               | HMI, Nav, &<br>Digital Cluster | 1080i-60 decode<br>DTV w/external<br>decoder De-<br>Interlace 1x<br>1080i60 (for RSE<br>display) | 1080p30 encode<br>(for remote eAVB<br>display)                                                                                                          | Audio mixing & routing                                      | CAN stack |
| UC7 (Integrated<br>Cockpit + Navi +<br>Media + Radio):<br>Infotainment +<br>Digital Cluster +<br>Dual DAB Radio | Hypervisor, IVI<br>HLOS, Cluster<br>OS, Nav,<br>Connectivity                               | HMI, Nav, &<br>Digital Cluster | 1080i-60 decode<br>DTV w/external<br>decoder De-<br>Interlace 1x<br>1080i60 (for RSE<br>display) | 1080p30 encode<br>(for remote eAVB<br>display)                                                                                                          | Dual DAB Radio +<br>Audio mixing &<br>routing               | CAN stack |

The UC7 IVI usecase has the heaviest bandwidth requirement. Figure 16 summarizes the UC7 data flow.



Copyright © 2018, Texas Instruments Incorporated

Figure 16. UC7 (Integrated Cockpit + Navi + Media + Radio)




The bandwidth requirements of the three worst-case IVI usecases are summarized in Table 18.

Table 18. IVI UC Traffic Requirement (MBps)

| Initiator | UC5 | UC6 | UC7 |
|-----------|----------|----------|----------|
| MPU | 678.8 | 1280.8 | 1280.8 |
| GPU | 2217.31 | 2103.72 | 2103.72 |
| DSP | 227.56 | 2.54 | 61.67 |
| IVA | 1348.43 | 1348.43 | 1348.43 |
| CAL | 165.69 | 0 | 0 |
| DSS | 1168.13 | 1426.64 | 1426.64 |
| VPE | 466.56 | 466.56 | 466.56 |
| Misc | 33.90858 | 32.03222 | 33.03422 |
| Total | 6306.389 | 6660.722 | 6720.854 |
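Table 18 can be used to estimate how much DDR headroom each usecase leaves. The Python sketch below totals the per-initiator requirements and expresses each total as a fraction of the theoretical DDR peak at 666 MHz. It assumes two 32-bit EMIFs with double-data-rate transfers. That assumption is inferred from the efficiency figures in Table 15; this section does not state it.

```python
# Sketch: total the per-initiator requirements of Table 18 (MBps) and express
# each usecase total as a fraction of the theoretical DDR peak at 666 MHz.
# Assumption (inferred, not stated in this section): two 32-bit EMIFs, so
# peak = 2 EMIFs x 4 bytes x 2 transfers/clock x 666 MHz = 10656 MBps.

REQUIREMENTS_MBPS = {
    "UC5": {"MPU": 678.8, "GPU": 2217.31, "DSP": 227.56, "IVA": 1348.43,
            "CAL": 165.69, "DSS": 1168.13, "VPE": 466.56, "Misc": 33.90858},
    "UC6": {"MPU": 1280.8, "GPU": 2103.72, "DSP": 2.54, "IVA": 1348.43,
            "CAL": 0.0, "DSS": 1426.64, "VPE": 466.56, "Misc": 32.03222},
    "UC7": {"MPU": 1280.8, "GPU": 2103.72, "DSP": 61.67, "IVA": 1348.43,
            "CAL": 0.0, "DSS": 1426.64, "VPE": 466.56, "Misc": 33.03422},
}

PEAK_666_MBPS = 2 * 4 * 2 * 666  # 10656 MBps under the assumption above

for usecase, initiators in REQUIREMENTS_MBPS.items():
    total = sum(initiators.values())
    share = total / PEAK_666_MBPS
    print(f"{usecase}: {total:8.2f} MBps = {share:.1%} of theoretical peak")
```

Under this assumption, UC7 needs roughly 63% of the theoretical peak. Table 15 shows about 64% average efficiency measured at 666 MHz, so little headroom remains. This is consistent with the need for the bandwidth knobs described in Section 5.6.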

The following subsections describe the configuration used for each initiator to reproduce the overall IVI UC7 traffic.

### 5.1 MPU CPU Traffic

The Cortex-A15 is responsible for generating OS traffic, TV bit stream out, modem/WiFi data out, TV AAC stream in, BT SCO audio, microphone audio, and file system transfers in UC7. To mimic this, the A15 was configured to perform memory copy transfers with the MPU I-cache and D-cache enabled. The hardware default A15 configuration was used to get optimal performance.

In the standalone mode, the MPU was found to generate a total average bandwidth of 3011.5 MBps.



Figure 17. MPU (Standalone) OS Mimic Memory Copy Performance Bandwidth Plot




### 5.2 Graphics Processing Traffic

The GPU is responsible for generating the HMI, navigation, and 3D cluster graphics in IVI UC7. GPU programming is tightly coupled with the Linux OS. The test did not use an OS, so the GPU traffic was mimicked with the BB2D 2D graphics engine. Compared with the other initiators, the BB2D has an access pattern relatively similar to the GPU's, and it connects to the same L3 switch fabric as the GPU. The BB2D was configured to perform a 4-layer 1080p YUV420 overlay.

With the BB2D processing such frames back to back, the total average bandwidth in standalone mode was measured at 3306 MBps.



Figure 18. BB2D (Standalone) Graphics Mimic Performance Bandwidth Plot

### 5.3 Display Traffic

The display traffic as a part of UC7 is listed below:

- One video output (1920x1080 @ 60 FPS) blending the following three layers:
  - DSS read of HMI (keyboard layer) buffer (Display1) @ 60 fps (1920x826), 4 BPP
  - DSS read of Navi layer 1 buffer (Display1) @ 60 fps (1920x543), 4 BPP
  - DSS read of Navi layer 2 buffer (Display1) @ 60 fps (1920x1007), 4 BPP
- Second video output (1920x720 @ 60 FPS) with the following layer:
  - DSS read of 3D cluster buffer (Display2) @ 60 fps (1920x720), 4 BPP
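The layer list above is enough to derive the DSS bandwidth requirement. Each layer is fetched once per refresh, so its bandwidth is width × height × bytes per pixel × frame rate. The Python sketch below makes three assumptions:

- "4 BPP" means 4 bytes per pixel, as in the BPP columns of Tables 21 and 24.
- Each layer is fetched exactly once per frame.
- 1 MB = 10^6 bytes.

Under those assumptions, the four layers add up to the 1426.64 MBps DSS figure listed for UC6 and UC7 in Table 18.

```python
# Sketch: DSS read bandwidth of the UC7 display layers listed above.
# Assumptions: "BPP" = bytes per pixel, one fetch per layer per refresh,
# and 1 MB = 10**6 bytes.

LAYERS = [
    # (layer, width, height, bytes_per_pixel, fps)
    ("HMI keyboard (Display1)", 1920, 826, 4, 60),
    ("Navi layer 1 (Display1)", 1920, 543, 4, 60),
    ("Navi layer 2 (Display1)", 1920, 1007, 4, 60),
    ("3D cluster (Display2)", 1920, 720, 4, 60),
]


def layer_mbps(width, height, bytes_per_pixel, fps):
    """Bytes fetched per second for one layer, in MBps."""
    return width * height * bytes_per_pixel * fps / 1e6


total = 0.0
for name, width, height, bpp, fps in LAYERS:
    bw = layer_mbps(width, height, bpp, fps)
    total += bw
    print(f"{name:24s} {bw:8.2f} MBps")
print(f"{'Total':24s} {total:8.2f} MBps")
```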

In order for such high real-time display traffic to be supported without sync losses and underflows, the following settings are recommended:

- BURSTSIZE = 8 x 128-bit bursts
- BUFPRELOAD = 1 (Hardware pre-fetches pixels up to high threshold value)

With the above settings in place the DSS was able to achieve an average bandwidth of 1445.56 MBps without any underflows and sync losses.




The DSS bandwidth profile for this configuration is shown in Figure 19.



Figure 19. DSS Standalone Bandwidth Profile for IVI Usecase Traffic

### 5.4 VPE Processing Traffic

The Video Processing Engine (VPE) is responsible for de-interlacing the 1080i decoded YUV420 streams to generate a progressive 1080p YUV420 stream at 30 FPS.

In standalone mode, the VPE traffic for this operation averaged 485.69 MBps, with a peak of 2194 MBps. Section 5.6 shows how placing a bandwidth limiter on the VPE ports prevents DSS underflow when all the initiators run together.

The VPE standalone bandwidth profile is shown in Figure 20.



Figure 20. VPE Standalone Bandwidth Profile for IVI Usecase
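The VPE traffic is bursty: its peak is more than four times its average. A bandwidth limiter set well above the average but far below the peak (700 MBps in Table 19) smooths such bursts. The Python sketch below uses a generic token-bucket model to show the intended effect. It is a textbook rate limiter, not a description of the TDA2Px bandwidth-limiter hardware. The demand pattern is hypothetical, and it uses 1 ms slots so that kB per slot equals MBps.

```python
# Sketch of a generic token-bucket rate limiter (textbook model, NOT the
# TDA2Px bandwidth-limiter hardware). Time is split into 1 ms slots, so a
# value in kB per slot is numerically equal to MBps (10**3 B / 10**-3 s).


def token_bucket(demand, rate, burst):
    """Return the kB granted per slot for a bursty demand pattern.

    demand: kB newly requested in each slot
    rate:   kB of credit added per slot (the bandwidth cap, in MBps)
    burst:  maximum credit that can accumulate (largest single-slot grant)
    Requests that exceed the available credit wait in a backlog.
    """
    tokens = burst
    backlog = 0
    granted = []
    for requested in demand:
        tokens = min(burst, tokens + rate)  # refill, capped at bucket size
        backlog += requested                # queue the new request
        sent = min(backlog, tokens)         # serve as much as credit allows
        tokens -= sent
        backlog -= sent
        granted.append(sent)
    return granted


# Hypothetical bursty pattern: one 2194 kB burst every 5 ms, limited to 700 MBps.
demand = [2194, 0, 0, 0, 0] * 4
granted = token_bucket(demand, rate=700, burst=700)

print("demand :", demand[:5])
print("granted:", granted[:5])
print("peak   :", max(demand), "->", max(granted), "MBps")
print("average:", sum(demand) / len(demand), "->", sum(granted) / len(granted), "MBps")
```

In this model, each 2194 kB burst is spread over four slots (700, 700, 700, 94). The peak is capped at 700 MBps, and the average (438.8 MBps) is unchanged because it stays below the cap.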




### 5.5 IVAHD Decode Traffic

IVI UC7 requires decoding two 1080i60 TV streams and encoding one 1080p30 stream. To mimic this traffic, a standalone decoder client application was run to decode a continuously looping I, P, B, B frame sequence. The decoded sequence was chosen so that the IVA bandwidth was close to the requirement.

With the stream HD\_CR\_KyuRyu.264, the average IVA decode bandwidth was 1008.2 MBps. The bandwidth profile is shown in Figure 21.



Figure 21. IVA Standalone 1080p60 Decode Bandwidth Profile for IVI Usecase




### 5.6 IVI Usecase Integrated Bandwidth

When the initiators from the individual sub-experiments were run together with DDR at 666 MHz, multiple bandwidth knobs had to be applied to meet the IVI usecase requirements with no DSS underflows and timely processing of the video frames. Applying the knobs step by step shows where the system-level bottlenecks reside.

Table 19. BW Knobs to Make IVI UC7 Work on TDA2Px

| Expt Name | DDR3 - 666 MHz (DSS Adaptive MFLAG + BB2D BR + No Extra EDMA + max sys = 12 + A15 lower priority) |
|---|---|
| OCP_CONFIG | 0x0C500000 |
| EMIF1_SYS (MBps) | 2799.14 |
| EMIF2_SYS (MBps) | 2790.17 |
| MA_MPU_P1 (MBps) | 634.85 |
| MA_MPU_P2 (MBps) | 639.06 |
| Total Avg. DDR BW (MBps) | 6863.21 |
| Avg. Efficiency | 64.4% |
| Remarks | 1. No DSS underflows.<br>2. IVA 1080p60, no drops.<br>3. BB2D higher BW (2.6 GBps versus 2.1 GBps requirement).<br>4. MPU bandwidth meeting requirement. |
| BW Knobs Used | 1. VPE BW limiter on both ports (700 MBps).<br>2. DSS Adaptive MFLAG (50%-75% thresholds) + DSS Priority Highest.<br>3. BB2D BW Regulator 1000 MBps.<br>4. Sys Threshold kept at 12.<br>5. MPU_MA_PRIORITY = 6. |
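Table 19 lists "DSS Adaptive MFLAG (50%-75% thresholds)". One common way to describe a two-threshold scheme like this is hysteresis:

- The DSS raises MFLAG, escalating the priority of its DDR requests, when its DMA FIFO drains below the low threshold.
- It drops MFLAG only after the FIFO refills above the high threshold.

The Python sketch below models that behavior. It is a simplified assumption for illustration, not the register-level DISPC implementation. The FIFO depth and fill levels are hypothetical.

```python
# Simplified hysteresis model of an adaptive MFLAG (assumed behavior, not the
# DISPC register-level implementation). Thresholds are % of FIFO depth.


class AdaptiveMflag:
    def __init__(self, fifo_depth, low_pct=50, high_pct=75):
        self.low = fifo_depth * low_pct / 100    # assert below this fill level
        self.high = fifo_depth * high_pct / 100  # deassert above this level
        self.asserted = False

    def update(self, fill_level):
        """Return the MFLAG state for the current FIFO fill level."""
        if fill_level < self.low:
            self.asserted = True
        elif fill_level > self.high:
            self.asserted = False
        # Between the thresholds the previous state is kept (hysteresis).
        return self.asserted


mflag = AdaptiveMflag(fifo_depth=100)  # hypothetical depth, arbitrary units
levels = [90, 70, 55, 45, 60, 70, 80, 65]
states = [mflag.update(level) for level in levels]
for level, state in zip(levels, states):
    print(f"fill {level:3d} -> MFLAG {'on' if state else 'off'}")
```

The gap between the two thresholds keeps the flag from toggling on every small change in FIFO level. Once urgency is raised, it stays raised until the FIFO has a comfortable margin again.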

Table 20 compares the per-initiator test traffic with the UC7 requirement.

Table 20. Initiator Wise Break Down of IVI UC7 Validation

| Initiator | UC7 Requirement From UCAD (MBps) | Test Traffic (MBps) | Remarks |
|---------------|----------------------------------|---------------------|------------------------------------------------------------------------------------------|
| MPU | 1280.8 | 1273.9 | Just short; expected to adjust with realistic GPU traffic. |
| GPU           | 2103.72                          | 2612.94             | Mimic via BB2D 1080p 4 layer overlay. Higher BW likely to absorb DSP and IVA difference. |
| DSP           | 61.67                            | -                   |                                                                                          |
| IVA           | 1348.43                          | 962.08              | Mimic via IPBB 1080p60 decode. (Stream: HD_CR_KyuRyu.264)                                |
| CAL           | 0                                | -                   |                                                                                          |
| DSS           | 1426.64                          | 1477.33             | Matched VID/GFX display dimensions, format and pixel clock rate.                         |
| VPE           | 466.56                           | 510.19              |                                                                                          |
| Miscellaneous | 33.03422                         | 57.94               | IPU CPU traffic                                                                          |
| Total         | 6720.854                         | 6863.218            | Approximately 140 MBps higher traffic than requirement                                   |



## 6 ADAS Usecase Performance

TDA2Px extends the following target use cases:

- ADAS 6-8 Ch smart surround view + CMS. Capture: 6-8 Ch; display: 2-3 Ch HD (2 x 2MP @ 30/60 fps); SRV processing: GPU @ 665 MHz; analytics: A15 @ 500 MHz, IPU1 @ 212 MHz, 2x DSPs, and 2x EVEs. Function: N-view, 3D stitch view for rendering, CMS, Rear View.

The ADAS 6 Channel Surround View + ISP system is shown in Figure 22. This is the primary usecase targeted for ADAS.




Figure 22. TDA2Px Surround View System



The expected bandwidth analysis of the usecase corresponding to 6 channel input and processing is shown in Table 21.

Table 21. ADAS 6 Channel Surround View + CMS With ISP

| Operation | Type | IP | H | V | BPP | FPS | CH | BW (MB/s) |
|---------------------------------------|-------|---------|------|------|-----|-----|----|-----------|
| 4x2 MP@30fps capture for SV (WR) RAW  | WR    | CAL     | 1920 | 1080 | 2   | 30  | 4  | 497.66    |
| ISP read of all SV Channels           | RD    | ISP     | 1920 | 1080 | 2   | 30  | 4  | 497.66    |
| 4x 2MP 30fps capture for SV (WR)      | WR    | ISP     | 1920 | 1080 | 1.5 | 30  | 4  | 373.25    |
| 2x1 MP@60fps capture for CMS (WR) RAW | WR    | VIP     | 1280 | 720  | 2   | 60  | 2  | 221.18    |
| ISP read of all CMS Channels | RD | ISP | 1280 | 720 | 2 | 60 | 2 | 221.18 |
| 2x 1MP 60fps capture for CMS (WR) | WR | ISP | 1280 | 720 | 1.5 | 60 | 2 | 165.89 |
| SGX 3D Synthesis (Output<br>RGB24)    | RD/WR | SGX     | 1920 | 1080 | 3   | 30  | 1  | 1740.00   |
| Display RGB24                         | RD    | DSS     | 1920 | 1080 | 3   | 30  | 1  | 186.62    |
| CMS O/P ( 2 displays)                 | RD    | DSS     | 1280 | 720  | 1.5 | 60  | 2  | 165.89    |
| Deflicker for 2x1MP@60 CMS cameras    | RD/WR | IVA+DSP |      |      |     |     |    | 1000.00   |
| Analytics for 2MP, 3 camera@10fps     | RD/WR |         |      |      |     |     |    | 1800.00   |
| Total                                 |       |         |      |      |     |     |    | 6869.34   |
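Every row of Table 21 that has frame geometry follows BW = H × V × BPP × FPS × CH / 10^6, with BPP in bytes per pixel. The Python sketch below recomputes those rows. It then adds the three budget-only rows (SGX, deflicker, analytics), using the values given in the table, which reproduces the 6869.34 MB/s total. The same formula also gives the expected values in Table 24.

```python
# Sketch: reproduce the formula-derived rows of Table 21.
# BW (MB/s) = H x V x BPP x FPS x CH / 10**6, where BPP is bytes per pixel.
# Rows with no geometry are budget figures copied from the table as given.

ROWS = [
    # (operation, H, V, BPP, FPS, CH)
    ("CAL RAW capture, SV",   1920, 1080, 2.0, 30, 4),
    ("ISP read, SV",          1920, 1080, 2.0, 30, 4),
    ("ISP YUV420 write, SV",  1920, 1080, 1.5, 30, 4),
    ("VIP RAW capture, CMS",  1280,  720, 2.0, 60, 2),
    ("ISP read, CMS",         1280,  720, 2.0, 60, 2),
    ("ISP YUV420 write, CMS", 1280,  720, 1.5, 60, 2),
    ("DSS read, SV RGB24",    1920, 1080, 3.0, 30, 1),
    ("DSS read, CMS",         1280,  720, 1.5, 60, 2),
]
BUDGETS_MBPS = {"SGX 3D synthesis": 1740.00,
                "Deflicker (IVA+DSP)": 1000.00,
                "Analytics": 1800.00}


def stream_mbps(h, v, bpp, fps, ch):
    """Bandwidth of CH identical streams of H x V pixels at FPS, in MB/s."""
    return h * v * bpp * fps * ch / 1e6


total = 0.0
for name, h, v, bpp, fps, ch in ROWS:
    bw = stream_mbps(h, v, bpp, fps, ch)
    total += bw
    print(f"{name:22s} {bw:8.2f}")
for name, bw in BUDGETS_MBPS.items():
    total += bw
    print(f"{name:22s} {bw:8.2f}")
print(f"{'Total':22s} {total:8.2f}")
```

The two DSS rows alone add up to 352.51 MB/s. That is the expected DSS bandwidth used in Table 23.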

Other variations of the ADAS usecases include:

- ADAS\_4CH\_SRV + ISS: Concurrent execution of the following initiators:
  - CAL\_4\_CHANNEL\_RADAR\_CAPTURE
  - DSS\_3VID\_3VENC
  - GC320 traffic to mimic SGX traffic
  - ISS\_4CH\_ISP\_PROCESSING
  - DSP1 and DSP2 EDMA, EVE EDMA
- ADAS\_7CH\_SRV\_CMS\_UC: Concurrent execution of the following initiators:
  - CAL\_VIP\_7\_CHANNEL\_CAPTURE
  - DSS\_3VID\_3VENC
  - GC320 traffic to mimic SGX traffic
  - DSP1 and DSP2 EDMA, EVE EDMA
- ADAS\_8CH\_SRV\_CMS\_UC: Concurrent execution of the following initiators:
  - CAL\_VIP\_8\_CHANNEL\_CAPTURE
  - DSS\_3VID\_3VENC
  - GC320 traffic to mimic SGX traffic
  - DSP1 and DSP2 EDMA, EVE EDMA



### 6.1 Display Traffic

The display traffic in the ADAS usecase is listed below:

- One display with SV output RGB888: 1920x1080, 3 bytes per pixel, 30 FPS
- Second display with CMS output, 2 channels: 1280x720, 1.5 bytes per pixel, 60 FPS

With YUV420 input, the DISPC scaler is enabled automatically (it converts YUV420 to YUV444 before the YUV-to-RGB conversion). At the start, the scaler requests multiple lines from the DMA to prefill its line buffers, even while the display is in blanking. This can cause a DMA underflow without any display sync loss. The condition is harmless. To avoid it, force the DMA to prefetch up to the high threshold value by setting DISPC\_VID3\_ATTRIBUTES.BUFPRELOAD to 1.

With the above settings in place the DSS was able to achieve an average bandwidth of 363.87 MBps without any underflows and sync losses.

The DSS bandwidth profile for this configuration is shown in Figure 23.



Figure 23. DSS Standalone Bandwidth Profile for IVI Usecase Traffic



### 6.2 ADAS 6 Channel + ISP Integrated Bandwidth

When the initiators from the individual sub-experiments were run together with DDR at 666 MHz, multiple bandwidth knobs had to be applied to meet the ADAS usecase requirements with no DSS underflows and timely processing of the frames. Applying the knobs step by step shows where the system-level bottlenecks reside.

Table 22. BW Knobs to Make ADAS 6Ch SRV + ISP Work on TDA2Px

| Expt Name | DDR3 - 666 MHz (ISP + DSP1 + EVE + BL & BR on BB2D + MPU Lower) |
|---|---|
| OCP_CONFIG | 0xC500000 |
| Avg. EMIF1_SYS (MBps) | 3587.231926 |
| Avg. EMIF2_SYS (MBps) | 3589.380462 |
| Avg. MA_MPU_P1 (MBps) | 208.1853929 |
| Avg. MA_MPU_P2 (MBps) | 207.546379 |
| Total Average DDR BW (MBps) | 7592.34 |
| Avg. Efficiency | 71.2% |
| Peak Total DDR BW (MBps) | 7907.84 |
| Peak Efficiency | 74.2% |
| Remarks | 1. DSS not underflowing (YUV420).<br>2. CAL no overflows.<br>3. DMA traffic block based (128x128): 3 GBps.<br>4. ISP 6 Channel Single Pass WDR completes in time.<br>5. BB2D traffic just meeting requirement (1.67 GBps). |
| BW Knobs Used | 1. DSS Adaptive MFLAG (50%-75% thresholds) + High Priority + BUFPRELOAD = 1.<br>2. BB2D BW Limiter 1100 MBps and BB2D BW Regulator 900 MBps.<br>3. MPU_MA_PRIORITY = 6.<br>4. CAL Adaptive MFLAG (25%-75% thresholds).<br>5. Sys Threshold = 12. |

The DMA traffic was then made block based, and ISP traffic was added. The ADAS usecase goals were met after applying these further settings:

- DSS BUFPRELOAD = 1
- DSS dynamic MFLAG and priority
- EMIF OCP\_CONFIG
- An appropriate MPU priority

Table 23 compares the per-initiator test traffic with the requirement.

Table 23. Initiator Wise Break Down of ADAS 6Ch SRV + ISP Validation

| Initiator | Expected BW (MB/s) | Test Case BW (MB/s) | Remarks |
|-----------|--------------------|--------------|----------------------------------------------------------------------------------------------------------------|
| CAL       | 718.85             | 749.15       | 6x2 MP@30fps capture for SV (WR) RAW                                                                           |
| ISP       | 1257.98            | 1279.47      | RAW to YUV420                                                                                                  |
| GPU       | 1740.00            | 1666.27      | SGX 3D Synthesis (Output RGB24) - Slightly lower bandwidth can be traded off with EDMA traffic in real system. |
| DSS       | 352.51             | 371.20       | YUV 420 BUFPRELOAD = 1                                                                                         |
| EDMAs     | 2800               | 3010.56      | Approximately 200 MBps higher than required                                                                    |
| MPU       | -                  | 415.73       | Extra traffic in the system                                                                                    |
| Total | 6869.34 | 7492.38 | Approximately 623 MB/s higher traffic than requirement |
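The per-initiator margins and the total margin in Table 23 are simple differences between the two columns. The Python sketch below treats the MPU's missing requirement ("-") as 0 MB/s. That is an assumption made only so the totals can be compared.

```python
# Sketch: per-initiator margin of the ADAS 6-channel SRV + ISP validation
# (Table 23). Values in MB/s. The MPU has no listed requirement ("-"), so it
# is treated as 0 MB/s here (assumption for the comparison only).

EXPECTED = {"CAL": 718.85, "ISP": 1257.98, "GPU": 1740.00,
            "DSS": 352.51, "EDMAs": 2800.00, "MPU": 0.0}
MEASURED = {"CAL": 749.15, "ISP": 1279.47, "GPU": 1666.27,
            "DSS": 371.20, "EDMAs": 3010.56, "MPU": 415.73}

for name, expected in EXPECTED.items():
    delta = MEASURED[name] - expected
    print(f"{name:6s} expected {expected:8.2f}  measured {MEASURED[name]:8.2f}  "
          f"delta {delta:+8.2f}")

total_expected = sum(EXPECTED.values())
total_measured = sum(MEASURED.values())
margin = total_measured - total_expected
print(f"Total  expected {total_expected:8.2f}  measured {total_measured:8.2f}  "
      f"delta {margin:+8.2f}")
```

The GPU is the only initiator that falls short of its requirement. The EDMA and MPU traffic more than cover that shortfall, which matches the remark that the slightly lower GPU bandwidth can be traded off against EDMA traffic.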



### 6.3 ADAS 4 Channel SRV + ISP Integrated Bandwidth

Using the settings found during the 6 Channel ADAS SRV + ISP bandwidth analysis, the 4 Channel ADAS SRV + ISP usecase was analyzed. Table 24 gives the expected bandwidth in this configuration.

Table 24. ADAS 4 Channel SRV + ISP Expected Bandwidth Analysis

| Operation | Type | IP | H | V | BPP | FPS | CH | BW (MB/s) |
|-----------------------------------------|-------|-----|------|------|-----|-----|----|-----------|
| 4x2 MP@30fps capture<br>for SV (WR) RAW | WR    | CAL | 1920 | 1080 | 2   | 30  | 4  | 497.66    |
| ISP read of all SV<br>Channels          | RD    | ISP | 1920 | 1080 | 2   | 30  | 4  | 497.66    |
| 4x 2MP 30fps capture for SV (WR)        | WR    | ISP | 1920 | 1080 | 1.5 | 30  | 4  | 373.25    |
| SGX 3D Synthesis<br>(Output RGB24)      | RD/WR | SGX | 1920 | 1080 | 3   | 30  | 1  | 1740.00   |
| Display RGB24                           | RD    | DSS | 1920 | 1080 | 3   | 30  | 1  | 186.62    |
| Analytics for 2MP, 3 camera@10fps       | RD/WR |     |      |      |     |     |    | 1800.00   |
| Total                                   |       |     |      |      |     |     |    | 5095.20   |

With no CAL or VIP overflows and no DSS underflows, the per-initiator bandwidth breakdown is shown in Table 25.

Table 25. Initiator Wise Break Down of ADAS 4Ch SRV + ISP Validation

| Initiator | Expected BW (MB/s) | Test Case BW (MB/s) | Remarks                                                                                                   |
|-----------|--------------------|---------------------|-----------------------------------------------------------------------------------------------------------|
| Capture   | 497.66             | 436.33              | 4x 2MP capture for SV (WR); 12-bit packed data (1.5 bytes per pixel) captured at a frame rate above 30 fps |
| ISP       | 870.91             | 874.72              | RAW to YUV420                                                                                             |
| GPU       | 1740.00            | 1762.65             | SGX 3D synthesis (output RGB24)                                                                           |
| DSS       | 186.62             | 371.20              | A 6-channel DSS configuration was used, so bandwidth is higher than required                              |
| EDMAs     | 1800.00            | 3316.82             | Approximately 1.5 GB/s higher than required                                                               |
| MPU       | -                  | 510.19              | Additional traffic in the system that is not budgeted                                                     |
| Total     | 5095.20            | 7271.91             | Approximately 2.18 GB/s more traffic than required                                                        |
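The remarks in Table 25 can be checked by differencing the measured and expected columns. The minimal sketch below does this and also estimates the frame rate implied by the measured capture bandwidth. That estimate is an illustrative inference, assuming 12-bit packed RAW occupies exactly 1.5 bytes per pixel on all four 1920x1080 channels; it is not a measured value. The function names are hypothetical.

```python
"""Compare expected vs. measured DDR bandwidth per initiator (Table 25)."""

# Initiator -> (expected MB/s, measured MB/s); None means no budget.
INITIATORS = {
    "Capture": (497.66, 436.33),
    "ISP": (870.91, 874.72),
    "GPU": (1740.00, 1762.65),
    "DSS": (186.62, 371.20),
    "EDMAs": (1800.00, 3316.82),
    "MPU": (None, 510.19),
}


def excess_mbps(expected, measured):
    """Measured minus expected bandwidth; unbudgeted traffic counts fully."""
    return measured - (expected if expected is not None else 0.0)


def capture_effective_fps(measured_mbps, h=1920, v=1080, bpp=1.5, ch=4):
    """Frame rate implied by a measured capture bandwidth, assuming
    12-bit packed RAW (1.5 bytes per pixel) on every channel."""
    return measured_mbps * 1e6 / (h * v * bpp * ch)


if __name__ == "__main__":
    total = 0.0
    for name, (exp, meas) in INITIATORS.items():
        delta = excess_mbps(exp, meas)
        total += delta
        print(f"{name:8s} excess {delta:+9.2f} MB/s")
    print(f"Total    excess {total:+9.2f} MB/s")
    fps = capture_effective_fps(INITIATORS["Capture"][1])
    print(f"Implied capture rate: {fps:.1f} fps")
```

Under these assumptions, the EDMA excess is about 1.52 GB/s, the total excess is about 2.18 GB/s, and the measured capture bandwidth corresponds to roughly 35 fps, which is consistent with the "captured at a frame rate above 30 fps" remark.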

#### 7 References

- TDA2Px SoC for Advanced Driver Assistance Systems (ADAS) Silicon Revision 1.0 Technical Reference Manual


# **Revision History**

NOTE: Page numbers for previous revisions may differ from page numbers in the current version.

| Changes from Original (April 2018) to A Revision | Page |
|--------------------------------------------------|------|
| Update was made in Section 1.                    | 3    |
| Update was made in Section 2.1.4.                | 8    |
| Updates were made in Section 6.                  | 26   |

#### IMPORTANT NOTICE AND DISCLAIMER

TI PROVIDES TECHNICAL AND RELIABILITY DATA (INCLUDING DATASHEETS), DESIGN RESOURCES (INCLUDING REFERENCE DESIGNS), APPLICATION OR OTHER DESIGN ADVICE, WEB TOOLS, SAFETY INFORMATION, AND OTHER RESOURCES "AS IS" AND WITH ALL FAULTS, AND DISCLAIMS ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT OF THIRD PARTY INTELLECTUAL PROPERTY RIGHTS.

These resources are intended for skilled developers designing with TI products. You are solely responsible for (1) selecting the appropriate TI products for your application, (2) designing, validating and testing your application, and (3) ensuring your application meets applicable standards, and any other safety, security, or other requirements. These resources are subject to change without notice. TI grants you permission to use these resources only for development of an application that uses the TI products described in the resource. Other reproduction and display of these resources is prohibited. No license is granted to any other TI intellectual property right or to any third party intellectual property right. TI disclaims responsibility for, and you will fully indemnify TI and its representatives against, any claims, damages, costs, losses, and liabilities arising out of your use of these resources.

TI's products are provided subject to TI's Terms of Sale (www.ti.com/legal/termsofsale.html) or other applicable terms available either on ti.com or provided in conjunction with such TI products. TI's provision of these resources does not expand or otherwise alter TI's applicable warranties or warranty disclaimers for TI products.

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265

Copyright © 2018, Texas Instruments Incorporated